Abstract

It is known that for memoryless sources, the average and maximal redundancy of
fixed-to-variable length codes, such as the Shannon and Huffman codes, exhibit two
modes of behavior for long blocks. It either converges to a limit or it has an oscillatory
pattern, depending on the irrationality or rationality, respectively, of certain parame- 
ters that depend on the source. In this paper, we extend these findings, concerning the 
Shannon code, to the case of a Markov source, which is considerably more involved. 
While this dichotomy, of convergent vs. oscillatory behavior, is well known in other 
contexts (including renewal theory, ergodic theory, local limit theorems and large de- 
viations of discrete distributions), in information theory (e.g., in redundancy analysis) 
it was recognized relatively recently. To the best of our knowledge, no results of this 
type were reported thus far for Markov sources. We provide a precise characterization 
of the convergent vs. oscillatory behavior of the Shannon code redundancy for a class 
of irreducible, periodic and aperiodic, Markov sources. These findings are obtained by 
analytic methods, such as Fourier/Fejer series analysis and spectral analysis of matrices. 
spectral analysis, analytic information theory.
1 Introduction



Recent years have witnessed a resurgence of interest in redundancy rates of lossless coding, 
see, e.g., [1], [3], [6], [10], [13], [14], [15], [16], [17], [18]. In particular, in [18] Szpankowski 
derived asymptotic expressions of the (unnormalized) average redundancy as a function 
of the block length n, for the Shannon code, the Huffman code, and other codes, focusing 
primarily on the binary memoryless source (BSS), parametrized by $p$, the probability of
'1'. A rather interesting behavior of $R_n$ was revealed in [18], especially in the cases of
the Shannon code and the Huffman code: When $\alpha = \log_2[(1-p)/p]$ is irrational, $R_n$
converges to a constant (which is 1/2 for the Shannon code) as $n \to \infty$. On the other hand,
when a is rational, Rn has a non-vanishing oscillatory term whose fundamental frequency 
and amplitude depend on the source statistics in an explicit manner. 

More precisely, confining the discussion to the Shannon code, in [18] the average unnor- 
malized redundancy 

$$R_n = E\{\lceil -\log_2 P(X_1,\ldots,X_n)\rceil + \log_2 P(X_1,\ldots,X_n)\}, \tag{1}$$

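was analyzed for large $n$, assuming that the source $P$ that governs the data to be
compressed, $X_1, X_2, \ldots$, is a BSS. A straightforward extension (see also [14]) of the
Shannon-code redundancy result of [18] to a general $r$-ary alphabet memoryless source, with
letter probabilities $p_1, \ldots, p_r$, yields the following expression:

$$R_n = \begin{cases} \dfrac{1}{2} + o(1), & \text{not all } \{\alpha_j\} \text{ are rational} \\[2mm] \dfrac{1}{2} + \dfrac{1}{M}\left(\dfrac{1}{2} - \langle M n \beta\rangle\right) + o(1), & \text{all } \{\alpha_j\} \text{ are rational} \end{cases} \tag{2}$$

where $\beta = -\log p_1$, $\alpha_j = \log(p_j/p_1)$, $j = 2, 3, \ldots, r$, $\langle u\rangle$ is the fractional part of a real
number $u$ (i.e., $\langle u\rangle = u - \lfloor u\rfloor$), and $M$ is the least common multiple of all denominators
of the rational numbers $\{\alpha_j\}$ when presented as ratios between two relatively prime integers.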
This erratic behavior, where Rn is either convergent (and then the limit is always 1/2) or 
oscillatory, depending on the rationality of {aj}, was related in [14] to wave diffraction 
patterns of scattering from partially disordered media, where the existence/non-existence 
of Bragg peaks depends on the rationality/irrationality of certain optical distance ratios.
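The two-mode behavior in the memoryless case can be checked numerically for small blocks. The sketch below computes the exact redundancy of definition (1) for a binary memoryless source by summing over the number of '1's in the block, and compares it with the oscillatory expression $\frac{1}{2} + \frac{1}{M}(\frac{1}{2} - \langle Mn\beta\rangle)$. The function names are illustrative assumptions, not taken from the paper, and the comparison is only meaningful when all log-ratios are rational so that $M$ exists.

```python
import math

def shannon_redundancy_binary(p: float, n: int) -> float:
    """Exact R_n = E{ceil(-log2 P(X^n)) + log2 P(X^n)} for a binary memoryless source.

    p is the probability of the symbol '1'; blocks are grouped by their number of '1's.
    """
    q = 1.0 - p
    total = 0.0
    for k in range(n + 1):  # k = number of '1's in the block
        log_prob = k * math.log2(p) + (n - k) * math.log2(q)
        weight = math.comb(n, k) * (p ** k) * (q ** (n - k))
        total += weight * (math.ceil(-log_prob) + log_prob)
    return total

def oscillatory_formula(M: int, beta: float, n: int) -> float:
    """Oscillatory-mode expression 1/2 + (1/M)(1/2 - <M n beta>), without the o(1) term."""
    frac = (M * n * beta) % 1.0
    return 0.5 + (0.5 - frac) / M

if __name__ == "__main__":
    # Hypothetical example: letter probabilities (2/3, 1/3), so log2(p_1/p_2) = 1 and M = 1.
    p = 1.0 / 3.0
    beta = -math.log2(1.0 - p)
    for n in (10, 11, 12):
        print(n, shannon_redundancy_binary(p, n), oscillatory_formula(1, beta, n))
```

For this particular source every block has the same fractional part of $-\log_2 P(x^n)$, so the two columns agree up to rounding; for sources with irrational log-ratios the exact value drifts toward 1/2 instead.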

Our goal in this paper is to extend the scope of this analysis to irreducible Markov 
sources and to evaluate precisely (for large n) the average redundancy of the Shannon code 
for a finite alphabet, first order Markov source with given transition probabilities. In doing 
so, we also provide a more complete analysis than in [14] and [18]. As will be seen, this
extension to the Markov case appears rather non-trivial, both from the viewpoint of the 
conditions for oscillatory behavior and from the aspect of the asymptotic expression of Rn 
in the oscillatory mode. These depend strongly on the dominant eigenvalues and on the 
detailed structure of the matrix of transition probabilities. For example, in contrast to the 
memoryless case, where there is only one oscillatory term, when it comes to the Markov case, 
in the oscillatory mode there are, in general, contributions from multiple oscillatory terms, 
and in the convergent mode, Rn may converge to a constant other than 1/2 (see Example 2 
below). Moreover, it turns out that the behavior of the redundancy depends quite strongly 
on important dynamical properties of the Markov chain, such as reducibility/irreducibility 
and periodicity/aperiodicity.

We begin our study (Sections 2 and 3) from the relatively simple case where all single- 
step state transitions have positive probability. Our main result in Section 2, Theorem 1, is 
then an extension of formula (2) to the Markov case with strictly positive state transition 
probabilities. To give the reader a general idea of this theorem, an informal description of 
it can be stated as follows: Rather than the parameters $\{\alpha_j\}$ of the memoryless case, we
now define a matrix $\{\alpha_{jk}\}_{j,k=1}^{r}$ of log-ratios of certain transition probabilities (the exact
definition will be provided in the sequel). If at least one of these parameters is irrational,
then, as in the memoryless case, $R_n = \frac{1}{2} + o(1)$. If, on the other hand, all these
parameters are rational, then as in the memoryless case, let $M$ be their smallest common
denominator. In this case, $R_n = \Omega_n + o(1)$ for "most large values" of $n$ (a term that will be
defined precisely in the sequel), where $\Omega_n$ is a linear combination of certain functions of $n$,
for which we have an explicit formula in terms of the source parameters. These functions 
oscillate as n varies, with amplitude 1/M and a fundamental frequency that depends on 
the source parameters. 

In Section 4, we relax the strict positivity assumption, but still assume the Markov chain 
to be irreducible. Under this assumption, we first assume that the chain is also aperiodic, 
and then further extend the scope to allow periodicity. In these cases, the extension of 
eq. (2) is still available, though it is somewhat less explicit (than in the positive transition 
matrix case) in the sense that it depends on certain parameters of the source, for which we 
have no closed-form expressions, but which can be found by numerical procedures. It is 
also demonstrated (in Example 2) that the irreducibility assumption is essential, since the
above described two-mode behavior ceases to exist when this assumption is dropped.

We should point out that minimax redundancy and regret for the class of Markov 
sources were studied in the past - see, e.g., [10], [15]. Interestingly enough, the minimax re- 
gret for memoryless and Markov sources does not exhibit the two-mode behavior of either 
convergent or oscillatory mode [3]. This dichotomy, of convergent vs. oscillatory behav- 
ior, with dependence on rationality/irrationality of certain parameters, is a well recognized 
phenomenon in mathematics and physics, ranging across a large variety of areas, including 
renewal theory, ergodic theory [7], local limit theorems and large deviations for discrete 
distributions [2], [4]. This phenomenon, however, was observed in information theory only 
relatively recently [7], [18]. On the other hand, the oscillatory phenomenon for discrete ran- 
dom structures is a well known fact in analysis of algorithms [5], [19], and also in information 
theory [3], [13], [19]. 

2 Formulation and Results for Positive Transition Matrices 

In this section, we first establish notation conventions and spell out our assumptions. Then, 
we present our main result for the case of a positive transition probability matrix (Theorem 
1), discuss it, and provide an example for its use. 

Throughout this paper, we adopt the customary notation conventions in the information 
theory literature: Random variables will be denoted by capital letters (e.g., X), specific 
values they may take will be denoted by the corresponding lower-case letters (e.g., x), and 
their alphabets will be denoted by the corresponding calligraphic letters (e.g., $\mathcal{X}$). Random
vectors of length $n$ (e.g., $(X_1, X_2, \ldots, X_n)$) will be denoted by capital letters superscripted
by $n$ (e.g., $X^n$), and specific values of these vectors (e.g., $(x_1, x_2, \ldots, x_n)$) will be denoted
by lower-case letters superscripted by $n$ (e.g., $x^n$). Finally, the set of vectors of length $n$,
with components taking on values in $\mathcal{X}$, will be denoted by $\mathcal{X}^n$. Logarithms will always
be understood to be taken w.r.t. the base 2. The function $I(\cdot)$ will denote the indicator
function, that is, for a given statement $E$, $I(E) = 1$ if $E$ is true, and $I(E) = 0$ if $E$ is false.

Consider a source sequence $X_1, X_2, \ldots$, $X_t \in \mathcal{X} = \{1, 2, \ldots, r\}$ ($r$ a positive integer),
$t = 1, 2, \ldots$, governed by a first-order Markov chain with a given matrix $P$ of
state-transition probabilities $\{p(j|k)\}_{j,k=1}^{r}$. The initial state probabilities will be denoted by
$p_k$, $k = 1, 2, \ldots, r$. The stationary state probabilities will be denoted by $\pi_k$, $k = 1, 2, \ldots, r$.
Thus, the probability of a given source string $x^n = (x_1, \ldots, x_n) \in \mathcal{X}^n$, under the given
Markov source, is

$$\mu(x^n) = p_{x_1}\prod_{t=2}^{n} p(x_t|x_{t-1}). \tag{3}$$

The average unnormalized redundancy of the Shannon code is defined as 

$$R_n = E\{\lceil -\log \mu(X^n)\rceil + \log \mu(X^n)\}, \tag{4}$$

where here and throughout the sequel, $E\{\cdot\}$ denotes the expectation operator w.r.t. the
underlying Markov source $\mu$ just defined.

As mentioned in the Introduction, in this paper, we assume that P is irreducible. We 
remind the reader that an irreducible Markov chain is one where there is positive probability 
to pass from every state $j \in \mathcal{X}$ to every state $k \in \mathcal{X}$ within a finite number of steps, namely,
for every $j$ and $k$, there exists a positive integer $l$ such that the $(k,j)$-th element of $P^l$ is
strictly positive. Another important concept we will need is periodicity. The period $d_j$ of a
state $j$ is the greatest common divisor of all integers $n$ for which $\Pr\{X_n = j | X_0 = j\} > 0$.
A state is called periodic if dj > 1 and aperiodic if dj = 1. Since all states of an irreducible 
Markov chain are in the same class of communicating states, then dj is the same for all 
states, and hence will be denoted collectively by d. An irreducible Markov chain is then 
called periodic if d > 1 and aperiodic if d = 1. The case where all entries of P are 
positive, henceforth referred to as the case of a positive matrix P, is obviously a case of 
an irreducible, aperiodic Markov chain. However, the positivity of P is not a necessary 
condition for irreducibility and aperiodicity of a Markov chain. Throughout the remaining 
part of this section, as well as in Section 3, we assume that all entries of P are strictly 
positive. 

Our main result in this section is the following (the proof appears in Section 3). 

Theorem 1 Consider the Shannon code of block length $n$ for a Markov source $\mu$ with
a given vector $p = (p_1, \ldots, p_r)$ of initial state probabilities and a positive state transition
matrix $P$. Define

$$\alpha_{jk} = \log\left[\frac{p(j|1)\,p(j|j)}{p(k|1)\,p(j|k)}\right], \qquad j,k \in \{1,2,\ldots,r\}. \tag{5}$$

Then, the redundancy $R_n$ is characterized as follows:
(a) If not all $\{\alpha_{jk}\}$ are rational, then

$$R_n = \frac{1}{2} + o(1). \tag{6}$$

(b) If all $\{\alpha_{jk}\}$ are rational, then for every $j,k \in \{1, \ldots, r\}$, let

$$\zeta_{jk}(n) = M[-(n-1)\log p(1|1) + \log p(j|1) - \log p(k|1) - \log p_j], \tag{7}$$

and

$$\Omega_n = \frac{1}{2}\left(1 - \frac{1}{M}\right) + \frac{1}{M}\sum_{j=1}^{r}\sum_{k=1}^{r} p_j\pi_k\,\varrho[\zeta_{jk}(n)], \tag{8}$$

where $\varrho(u) = \lceil u\rceil - u$ and $M$ is the least common multiple of the denominators
of $\{\alpha_{jk}\}$, when each one of these numbers is represented as a ratio between two relatively
prime integers. Then, there exists a positive sequence $\xi_n \to 0$, which depends only on the source
parameters, such that $R_n$ is upper bounded and lower bounded as follows:

$$R_n \le \Omega_n + \frac{1}{M}\sum_{j=1}^{r}\sum_{k=1}^{r} p_j\pi_k\, I\{\varrho[\zeta_{jk}(n)] \notin (\xi_n, 1-\xi_n)\} + o(1), \tag{9}$$

$$R_n \ge \Omega_n - \frac{1}{M}\sum_{j=1}^{r}\sum_{k=1}^{r} p_j\pi_k\, I\{\varrho[\zeta_{jk}(n)] \notin (\xi_n, 1-\xi_n)\} - o(1). \tag{10}$$

As a technical comment, it should be pointed out that the choice of the index 1 in the
conditioning of $p(j|1)$ and $p(k|1)$ that appear in the definition of $\alpha_{jk}$ and in (7) is
completely arbitrary. One may choose any other index in $\{1,2,\ldots,r\}$, as long as it is the same
index in both places in the expression of $\alpha_{jk}$, as well as in the second and third terms in
the square brackets of (7). Also, $p(1|1)$ in (7) can be replaced independently by $p(l|l)$ for
any $l \in \{1,2,\ldots,r\}$.
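The quantities in Theorem 1 are straightforward to evaluate numerically. The following is a minimal sketch, assuming the transition matrix is stored as `P[k][j] = p(j|k)` with state 1 at index 0, and assuming that $M$ is supplied by the caller (detecting rationality from floating-point values is not attempted here). The helper names `alpha_matrix`, `zeta`, `rho` and `omega` are hypothetical.

```python
import math

def alpha_matrix(P):
    """alpha[j][k] = log2( p(j|1) p(j|j) / (p(k|1) p(j|k)) ), as in eq. (5).

    Convention (an assumption of this sketch): P[k][j] = p(j|k), state 1 is index 0.
    """
    r = len(P)
    return [[math.log2(P[0][j] * P[j][j] / (P[0][k] * P[k][j]))
             for k in range(r)] for j in range(r)]

def zeta(P, init, M, n, j, k):
    """zeta_jk(n) of eq. (7); init[j] is the initial probability of state j."""
    return M * (-(n - 1) * math.log2(P[0][0]) + math.log2(P[0][j])
                - math.log2(P[0][k]) - math.log2(init[j]))

def rho(u):
    """rho(u) = ceil(u) - u."""
    return math.ceil(u) - u

def omega(P, init, stat, M, n):
    """Omega_n of eq. (8); stat[k] is the stationary probability of state k."""
    r = len(P)
    s = sum(init[j] * stat[k] * rho(zeta(P, init, M, n, j, k))
            for j in range(r) for k in range(r))
    return 0.5 * (1.0 - 1.0 / M) + s / M

if __name__ == "__main__":
    # Hypothetical two-state chain whose rows are permutations of (2/3, 1/3).
    P = [[2 / 3, 1 / 3], [1 / 3, 2 / 3]]
    print(alpha_matrix(P))                       # all entries are integers, so M = 1
    print(omega(P, [0.5, 0.5], [0.5, 0.5], 1, 12))
```

Here all $\alpha_{jk}$ are integers while $\log p(1|1)$ is irrational, which corresponds to the oscillatory mode with $M = 1$.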

Discussion. Theorem 1 tells us that, similarly as in the memoryless case, in the positive
matrix case, $R_n$ has two modes of behavior. In the convergent mode, which happens
when at least one $\alpha_{jk}$ is irrational, $R_n \to 1/2$. In the oscillatory mode, which happens
when all $\{\alpha_{jk}\}$ are rational, $R_n$ oscillates and it asymptotically coincides with $\Omega_n$ for most
large values$^1$ of $n$, provided that $\log p(1|1)$ is irrational. This follows from the following
consideration: If $\log p(1|1)$ is irrational, then by Weyl's equidistribution theorem [12], the
sequences $\{\zeta_{jk}(n)\}_{n\ge 1}$ are uniformly distributed modulo 1, i.e., they fill the unit interval
mod 1 with a uniform density as $n$ exhausts the positive integers. Thus, for every fixed
$\xi$: $\varrho[\zeta_{jk}(n)] \notin (\xi, 1-\xi)$ for a fraction $2\xi$ of the values of $n$. This means that for $\xi_n \to 0$,
the terms $I\{\varrho[\zeta_{jk}(n)] \notin (\xi_n, 1-\xi_n)\}$ vanish for most large values of $n$, and then the lower
bound and the upper bound on $R_n$ asymptotically coincide with $\Omega_n$. If, on the other
hand, $\log p(1|1)$ is rational, then $\varrho[\zeta_{jk}(n)]$ are periodic sequences. If for none of the values
$n$ in a period, $\varrho[\zeta_{jk}(n)] = 0$, then beyond a certain value of $n$, $\xi_n$ is smaller than the
minimum value of $\varrho[\zeta_{jk}(n)]$ along the period and $1 - \xi_n$ is larger than the maximum, and
so, $I\{\varrho[\zeta_{jk}(n)] \notin (\xi_n, 1-\xi_n)\}$ all vanish for all large $n$. The expression

$$\frac{1}{M}\sum_{j=1}^{r}\sum_{k=1}^{r} p_j\pi_k\, I\{\varrho[\zeta_{jk}(n)] \notin (\xi_n, 1-\xi_n)\},$$

which generates the gap between the upper bound and the lower bound on $R_n$, can be
interpreted as an asymptotic approximation of the probability that $-\log\mu(X^n)$ falls in
the vicinity (within distance $O(\xi_n)$) of an integer. For example, when the source is purely
dyadic ($M = 1$), then $-\log\mu(X^n)$ is integer with probability 1, and indeed, the expression
in the last display is equal to 1. In this case, Theorem 1 is useless, but it is also redundant,
because in this case, we clearly know that $R_n$ vanishes. The reason for this "uncertainty"
around integer values of $-\log\mu(X^n)$ is that these are the discontinuity points of the
function $\varrho[-\log\mu(X^n)]$, and in the proof of Theorem 1, the function $\varrho$ is expanded as a series of
trigonometric polynomials whose convergence is problematic in the neighborhood of
discontinuities. Thus, we believe that the uncertainty in the characterization of $R_n$ around these
points should be attributed more to the limitations of the analysis methods than to the real
behavior of $R_n$. In other words, we conjecture that, in fact, $R_n = \Omega_n + o(1)$ for all large
$n$, and not just for most large values of $n$. It should be pointed out that these issues were
admittedly overlooked in [14] and [18] (beyond the case of a purely dyadic source, which
was ruled out in the first place). The essential results therein are nonetheless re-confirmed
here as a special case, upon carrying out a more rigorous analysis.

$^1$The statement "$R_n$ asymptotically coincides with $\Omega_n$ for most large values of $n$" means that for every
$\epsilon > 0$, the fraction of values of $n$, within the range $\{1, \ldots, N\}$, for which $|R_n - \Omega_n| > \epsilon$, tends to zero as
$N \to \infty$.
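The equidistribution step in the Discussion can be illustrated numerically. The sketch below counts, for an irrational $\beta$, the fraction of indices $n \le N$ for which the fractional part of $n\beta$ lies within $\xi$ of an integer; by Weyl's theorem this fraction should approach $2\xi$. The choice $\beta = -\log_2(2/3)$ and the function name are assumptions made only for this illustration.

```python
import math

def fraction_near_integer(beta: float, xi: float, N: int) -> float:
    """Fraction of n in {1, ..., N} whose fractional part <n*beta> lies outside (xi, 1 - xi)."""
    hits = 0
    for n in range(1, N + 1):
        f = (n * beta) % 1.0
        if f <= xi or f >= 1.0 - xi:
            hits += 1
    return hits / N

if __name__ == "__main__":
    beta = -math.log2(2.0 / 3.0)  # irrational, standing in for -log p(1|1)
    for xi in (0.1, 0.05, 0.01):
        print(xi, fraction_near_integer(beta, xi, 100_000))
```

As $\xi$ shrinks, so does the share of 'problematic' block lengths, which is the sense in which the bounds of Theorem 1 coincide for most large $n$.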

The expression of $\Omega_n$ in the oscillatory case is not quite intuitive at first glance. Therefore,
in this paragraph, we make an attempt to give some quick insight, which captures the essence
of the main points. The arguments here are informal and non-rigorous (the rigorous proof
is in Section 3). The Fourier series expansion of the periodic function $\varrho$ is given by

$$\varrho(u) = \frac{1}{2} + \sum_{m \ne 0} a_m e^{2\pi i m u}, \qquad a_m = \frac{1}{2\pi i m}, \tag{11}$$

and the important fact about the coefficients is that they are inversely proportional to
$m$, so that for every two non-zero integers $k$ and $m$, $a_{m\cdot k} = a_m/k$. Now, when computing
$R_n = E\{\varrho[-\log\mu(X^n)]\}$, let us take the liberty of exchanging the order between the expectation
and the summation, i.e.,

$$R_n = \frac{1}{2} + \sum_{m \ne 0} a_m E\{e^{-2\pi i m\log\mu(X^n)}\}. \tag{12}$$

It turns out that under the conditions of the oscillatory mode, $E\{e^{-2\pi i m\log\mu(X^n)}\}$ tends to
zero as $n \to \infty$ for all $m$, except for multiples$^2$ of $M$, namely, $m = \ell M$, $\ell = \pm 1, \pm 2, \ldots$
Thus, for large $n$, we have

$$R_n \approx \frac{1}{2} + \sum_{\ell \ne 0} a_{\ell M} E\{e^{-2\pi i \ell M\log\mu(X^n)}\} = \frac{1}{2} + \frac{1}{M}\sum_{\ell \ne 0} a_{\ell} E\{e^{2\pi i \ell[-M\log\mu(X^n)]}\} = \frac{1}{2} + \frac{1}{M}E\left\{\varrho[-M\log\mu(X^n)] - \frac{1}{2}\right\}. \tag{13}$$



Now, consider the set of all $\{x^n\}$ that begin from state $x_1 = j$ and end at state $x_n = k$.
Their total probability is about $p_j\pi_k$ for large $n$ since $X_n$ is almost independent of
$X_1$. It turns out that all these sequences have exactly the same value of $\varrho[-M\log\mu(x^n)]$,
which is exactly $\varrho[\zeta_{jk}(n)]$ (or, in other words, $\varrho[-M\log\mu(x^n)] = \varrho[\zeta_{x_1 x_n}(n)]$ independently
of $x_2, \ldots, x_{n-1}$) and this explains the expression of $\Omega_n$. The reason for this property of
$\varrho[-M\log\mu(x^n)]$ is the rationality conditions $\langle M\cdot\alpha_{uv}\rangle = 0$, $u,v \in \{1,2,\ldots,r\}$, which
imply that $\langle M\log p(x_t|x_{t-1})\rangle = \langle M\log[p(x_t|1)p(1|1)/p(x_{t-1}|1)]\rangle$, and so,

$$\langle -M\log\mu(x^n)\rangle = \langle -M\log p_j\rangle + \sum_{t=2}^{n}\langle -M\log p(x_t|x_{t-1})\rangle \mod 1$$

$$= \langle -M\log p_j\rangle + \sum_{t=2}^{n}\langle -M\log[p(x_t|1)p(1|1)/p(x_{t-1}|1)]\rangle \mod 1, \tag{14}$$

which, thanks to the telescopic summation, is easily seen to coincide with the fractional
part of $\zeta_{jk}(n)$, and of course, $\varrho[\zeta_{jk}(n)]$ depends on $\zeta_{jk}(n)$ only via its fractional part.
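The telescoping property can be verified by brute force on a small chain. The sketch below enumerates all sequences of a given length for a hypothetical two-state source with rational log-ratios ($M = 1$), and measures the largest mismatch (modulo 1) between $-M\log\mu(x^n)$ and $\zeta_{x_1 x_n}(n)$. The chain, the uniform initial distribution and all names are assumptions made for illustration; indices are 0-based, so state 1 of the text is index 0.

```python
import itertools
import math

# Hypothetical source: P[k][j] = p(j|k); rows are permutations of (2/3, 1/3), so M = 1.
P = [[2 / 3, 1 / 3], [1 / 3, 2 / 3]]
INIT = [0.5, 0.5]
M = 1

def log_mu(x):
    """log2 of mu(x^n) = p_{x_1} * prod_{t>=2} p(x_t | x_{t-1})."""
    lp = math.log2(INIT[x[0]])
    for prev, cur in zip(x, x[1:]):
        lp += math.log2(P[prev][cur])
    return lp

def zeta(n, j, k):
    """zeta_jk(n) = M[-(n-1) log p(1|1) + log p(j|1) - log p(k|1) - log p_j]."""
    return M * (-(n - 1) * math.log2(P[0][0]) + math.log2(P[0][j])
                - math.log2(P[0][k]) - math.log2(INIT[j]))

def circular_distance(a, b):
    """Distance between a and b on the unit circle (i.e., modulo 1)."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

def max_mismatch(n):
    """Largest mod-1 gap between -M log mu(x^n) and zeta_{x_1 x_n}(n) over all x^n."""
    worst = 0.0
    for x in itertools.product(range(len(P)), repeat=n):
        worst = max(worst, circular_distance(-M * log_mu(x), zeta(n, x[0], x[-1])))
    return worst

if __name__ == "__main__":
    print(max_mismatch(8))  # expected to be at floating-point noise level
```

A mismatch at the level of rounding error indicates that the fractional part of $-M\log\mu(x^n)$ depends on the sequence only through its first and last states, as the informal argument claims.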
Consider next the following example for using Theorem 1. 



Example 1. Consider a Markov source for which the rows of $P$ are all permutations of the
first row, which is $p = (p_1, \ldots, p_r)$. Now, assuming that $\alpha_j = \log(p_1/p_j)$ are all rational,
let $M$ be the least common multiple of their denominators (i.e., the common denominator)
when each one of them is expressed as a ratio between two relatively prime integers. Then,

$$\varrho[\zeta_{jk}(n)] = \varrho[-M(n-1)\log p(1|1) + M\log p(j|1) - M\log p(k|1) - M\log p_j]$$
$$= \varrho[-M(n-1)\log p_1 + M\log p_j - M\log p_k - M\log p_j]$$
$$= \varrho[-M(n-1)\log p_1 - M\log p_k]$$
$$= \varrho(-Mn\log p_1 + M\log p_1 - M\log p_k)$$
$$= \varrho(-Mn\log p_1), \tag{15}$$

where in the last step, we have used the fact that $(M\log p_1 - M\log p_k)$ is integer and
that $\varrho$ is a periodic function with period 1. Thus, with the exception of the minority of
'problematic' values of $n$, we have

$$R_n = \frac{1}{2}\left(1 - \frac{1}{M}\right) + \frac{1}{M}\sum_{j=1}^{r}\sum_{k=1}^{r} p_j\pi_k\,\varrho[\zeta_{jk}(n)] + o(1)$$
$$= \frac{1}{2}\left(1 - \frac{1}{M}\right) + \frac{1}{M}\sum_{j=1}^{r}\sum_{k=1}^{r} p_j\pi_k\,\varrho(-nM\log p_1) + o(1)$$
$$= \frac{1}{2}\left(1 - \frac{1}{M}\right) + \frac{1}{M}\varrho(-nM\log p_1) + o(1). \tag{16}$$

$^2$The convergent mode can be treated as a special case of this statement with $M = \infty$.

If not all $\alpha_j$ are rational, then $R_n \to 1/2$, as predicted by Theorem 1. To see why the
conditions of Theorem 1 lead to the rationality condition herein, let us denote $u_{jk} =
\langle m\log[p(j|1)/p(k|1)]\rangle$, and $v_{jk} = \langle m\log[p(j|j)/p(j|k)]\rangle$. Then, the conditions of Theorem 1
mean that $u_{jk} + v_{jk} = 0$ (mod 1) for all pairs $j$ and $k$. Therefore, the number of constraints here
is of the order of $r^2$, whereas the number of degrees of freedom that generate these variables,
in this example, is $r - 1$, i.e., the variables $\langle m\log(p_1/p_j)\rangle$, $j = 2,3,\ldots,r$. Thus, we can
think of this as an overdetermined set of homogeneous linear equations whose only solution
is zero, meaning that $\langle m\log(p_1/p_j)\rangle$, $j = 2,3,\ldots,r$, all vanish. Note that the memoryless
source is a special case of this example, where the rows of $P$ are all identical to the first
row, $(p_1, \ldots, p_r)$. Indeed, eq. (16) coincides with the expression of the memoryless case (see
[14], [18] and the Introduction of this paper).

3 Proof of Theorem 1 
3.1 Introductory Comments 

The main idea behind the analysis of $R_n = E\{\varrho[-\log\mu(X^n)]\}$ is to approximate the
periodic function $\varrho(\cdot)$ by a sequence of trigonometric polynomials, and then to commute the
expectation with the summation and analyze the various terms of the series. For these
commutations to be legitimate, a sufficient condition is that the convergence would be
uniform, but unfortunately, it cannot be uniform since the function $\varrho$ is discontinuous. An
alternative route that we take is to sandwich $\varrho$ between two continuous periodic functions,
$\varrho_\theta^-$ and $\varrho_\theta^+$, both with period 1, and both indexed by a parameter $\theta > 0$; as $\theta$ tends
to zero, the bounds become tighter and tighter. Fejér's theorem (see, e.g., [20]), which
is the trigonometric version of the Weierstrass theorem, provides a concrete sequence of
trigonometric polynomials, which converges uniformly to any given periodic function which
is continuous. The program of the proof is to apply Fejér's theorem to $\varrho_\theta^-$ and $\varrho_\theta^+$, and use
them to obtain sandwich bounds on $R_n$.

3.2 Preliminaries of the Proof 

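Define the function $\varrho_\theta^-$ as

$$\varrho_\theta^-(u) = \begin{cases} \dfrac{(1-\theta)\langle u\rangle}{\theta}, & 0 \le \langle u\rangle < \theta \\[2mm] 1 - \langle u\rangle, & \theta \le \langle u\rangle < 1 \end{cases} \tag{17}$$

and

$$\varrho_\theta^+(u) = \varrho_\theta^-(u) + \Delta_\theta(u), \tag{18}$$

$$\Delta_\theta(u) = \begin{cases} 1 - \dfrac{\langle u\rangle}{\theta}, & 0 \le \langle u\rangle < \theta \\[2mm] 0, & \theta \le \langle u\rangle < 1-\theta \\[2mm] \dfrac{1}{\theta}(\langle u\rangle + \theta - 1), & 1-\theta \le \langle u\rangle < 1 \end{cases} \tag{19}$$

Obviously, $\varrho_\theta^-(u)$ and $\varrho_\theta^+(u)$ are continuous, periodic functions with period 1, and $\varrho_\theta^-(u) \le
\varrho(u) \le \varrho_\theta^+(u)$ for every $u$. Now, $\varrho_\theta^-$ and $\Delta_\theta$ have the following Fourier representations:

$$\varrho_\theta^-(u) = \frac{1-\theta}{2} + \sum_{m \ne 0} a_m(\theta)e^{2\pi i m u}, \qquad \Delta_\theta(u) = \theta + \sum_{m \ne 0} b_m(\theta)e^{2\pi i m u},$$

where

$$a_m(\theta) = \frac{e^{-2\pi i m\theta} - 1}{4\pi^2 m^2\theta} \tag{20}$$

and

$$b_m(\theta) = \frac{1 - \cos(2\pi m\theta)}{2\pi^2 m^2\theta}. \tag{21}$$

Note that for any given non-zero integers $k$ and $\ell$,

$$a_{k\ell}(\theta) = \frac{a_\ell(k\theta)}{k} \tag{22}$$

and similarly

$$b_{k\ell}(\theta) = \frac{b_\ell(k\theta)}{k}. \tag{23}$$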
These identities will be important later on, in order to return from the series expansions
back to the original functions. The $N$-th order Fejér approximations are given by

$$\{\varrho_\theta^-(u)\}_N = \frac{1-\theta}{2} + \sum_{|m|=1}^{N} a_m(\theta)\cdot\left(1 - \frac{|m|}{N+1}\right)e^{2\pi i m u} \tag{24}$$

and

$$\{\Delta_\theta(u)\}_N = \theta + \sum_{|m|=1}^{N} b_m(\theta)\cdot\left(1 - \frac{|m|}{N+1}\right)e^{2\pi i m u}. \tag{25}$$

According to Fejér's theorem, as $N \to \infty$, these functions converge uniformly to $\varrho_\theta^-(u)$ and
$\Delta_\theta(u)$, respectively. However, it should be kept in mind that in order to guarantee that the
absolute error would be uniformly within less than a given $\epsilon$ (for all three functions $\varrho_\theta^-$, $\varrho_\theta^+$,
and $\Delta_\theta$), the integer $N$ should be at least as large as some $N_0(\epsilon, \theta)$ (or $N_0$ for shorthand
notation), which grows both as $\epsilon$ decreases and as $\theta$ decreases. In particular, following the
proof of Fejér's theorem [20, p. 6] (see also Appendix herein), it is readily seen that for all
three functions, $\varrho_\theta^-$, $\varrho_\theta^+$, and $\Delta_\theta$,

$$\epsilon_0(N,\theta) = \inf_{0<\delta<1/2}\left[\frac{\delta}{\theta} + \frac{1}{\theta N\sin^2(\pi\delta)}\right] \tag{26}$$

is an upper bound on the maximum approximation error when $N$ terms of the Fejér series
are used. Thus, $N_0(\epsilon,\theta)$ can be defined as the smallest integer $N$ such that $\epsilon_0(N,\theta) \le \epsilon$.
Obviously, by definition

$$\epsilon_0[N_0(\epsilon,\theta),\theta] \le \epsilon. \tag{27}$$

We will make use of this simple inequality later on.
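The uniform convergence promised by Fejér's theorem can be observed numerically. The sketch below is written under stated assumptions: it takes $\varrho_\theta^-$ to be the piecewise-linear lower envelope of $\varrho$ (a ramp of slope $(1-\theta)/\theta$ on $[0,\theta)$ followed by $1-\langle u\rangle$), and it uses Fourier coefficients derived for that shape, $a_m(\theta) = (e^{-2\pi i m\theta}-1)/(4\pi^2 m^2\theta)$ with mean $(1-\theta)/2$. Both the ramp and the coefficient formula are assumptions of this sketch and should be checked against the paper's own definitions.

```python
import cmath
import math

def rho_minus(u: float, theta: float) -> float:
    """Assumed continuous lower envelope of rho(u) = ceil(u) - u."""
    f = u % 1.0
    return f * (1.0 - theta) / theta if f < theta else 1.0 - f

def a_coef(m: int, theta: float) -> complex:
    """Fourier coefficient of rho_minus for m != 0 (derived for the assumed shape)."""
    return (cmath.exp(-2j * math.pi * m * theta) - 1.0) / (4.0 * math.pi ** 2 * m ** 2 * theta)

def fejer_rho_minus(u: float, theta: float, N: int) -> float:
    """N-th order Fejer sum: mean + sum_{|m|<=N} (1 - |m|/(N+1)) a_m e^{2 pi i m u}."""
    total = (1.0 - theta) / 2.0
    for m in range(1, N + 1):
        weight = 1.0 - m / (N + 1)
        # a_{-m} is the complex conjugate of a_m, so the +/- m pair contributes 2*Re.
        total += 2.0 * weight * (a_coef(m, theta) * cmath.exp(2j * math.pi * m * u)).real
    return total

def max_error(theta: float, N: int, grid: int = 400) -> float:
    """Maximum absolute Fejer approximation error over a uniform grid on [0, 1)."""
    return max(abs(fejer_rho_minus(i / grid, theta, N) - rho_minus(i / grid, theta))
               for i in range(grid))

if __name__ == "__main__":
    for N in (20, 50, 200):
        print(N, max_error(0.1, N))
```

Smaller $\theta$ makes the ramp steeper and therefore calls for a larger $N$ to reach the same accuracy, which is the dependence of $N_0(\epsilon,\theta)$ on $\theta$ described above.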

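3.3 General Lower and Upper Bounds on $R_n$

We proceed with some general lower and upper bounds on $R_n$. As for the lower bound, we
have

$$R_n = E\{\varrho(-\log\mu(X^n))\}$$
$$\ge E\{\varrho_\theta^-(-\log\mu(X^n))\}$$
$$\ge E\left\{\frac{1-\theta}{2} + \sum_{|m|=1}^{N_0} a_m(\theta)\cdot\left(1 - \frac{|m|}{N_0+1}\right)e^{-2\pi i m\log\mu(X^n)}\right\} - \epsilon$$
$$= \frac{1-\theta}{2} + \sum_{|m|=1}^{N_0} a_m(\theta)\cdot\left(1 - \frac{|m|}{N_0+1}\right)E\{e^{-2\pi i m\log\mu(X^n)}\} - \epsilon. \tag{28}$$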
Now, clearly

$$E\{e^{-2\pi i m\log\mu(X^n)}\} = \sum_{x^n \in \mathcal{X}^n}\prod_{t=1}^{n}\left[p(x_t|x_{t-1})\exp\{-2\pi i m\log p(x_t|x_{t-1})\}\right], \tag{29}$$

where $p(x_1|x_0)$ is understood as the initial probability $p_{x_1}$.
Define the $r \times r$ complex matrix $A_m$ whose entries are

$$a_{jk}(m) = p(k|j)\exp[-2\pi i m\log p(k|j)], \qquad j,k = 1,\ldots,r. \tag{30}$$

Also define the $r$-dimensional column vectors

$$c_m = (p_1\exp[-2\pi i m\log p_1], \ldots, p_r\exp[-2\pi i m\log p_r])^T, \tag{31}$$

and $\mathbf{1} = (1, 1, \ldots, 1)^T$, where the superscript $T$ denotes vector/matrix transposition. Then,
it follows that

$$E\{e^{-2\pi i m\log\mu(X^n)}\} = c_m^T A_m^{n-1}\mathbf{1}. \tag{32}$$
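Identity (32) can be sanity-checked on a small chain by comparing the matrix expression with a direct enumeration of all sequences. The sketch below assumes the matrix is indexed as `P[j][k] = p(k|j)` (current state `j`, next state `k`), matching the entries $a_{jk}(m)$ of eq. (30); the function names and the example chain are hypothetical.

```python
import cmath
import itertools
import math

def char_brute_force(P, init, m, n):
    """E{exp(-2 pi i m log2 mu(X^n))} by enumerating every sequence x^n."""
    r = len(P)
    total = 0j
    for x in itertools.product(range(r), repeat=n):
        prob = init[x[0]]
        for a, b in zip(x, x[1:]):
            prob *= P[a][b]
        total += prob * cmath.exp(-2j * math.pi * m * math.log2(prob))
    return total

def char_matrix_form(P, init, m, n):
    """The same expectation computed as c_m^T A_m^{n-1} 1, following eqs. (30)-(32)."""
    r = len(P)
    A = [[P[j][k] * cmath.exp(-2j * math.pi * m * math.log2(P[j][k]))
          for k in range(r)] for j in range(r)]
    c = [init[j] * cmath.exp(-2j * math.pi * m * math.log2(init[j])) for j in range(r)]
    v = [1 + 0j] * r
    for _ in range(n - 1):  # apply A_m to the all-ones vector n-1 times
        v = [sum(A[j][k] * v[k] for k in range(r)) for j in range(r)]
    return sum(c[j] * v[j] for j in range(r))

if __name__ == "__main__":
    P = [[0.7, 0.3], [0.4, 0.6]]  # hypothetical positive transition matrix
    init = [0.5, 0.5]
    print(char_brute_force(P, init, 3, 6))
    print(char_matrix_form(P, init, 3, 6))
```

The matrix form costs $O(nr^2)$ operations instead of $O(r^n)$, which is what makes the spectral analysis of $A_m$ in the rest of the proof tractable.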

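Let $l_{j,m}$ and $r_{j,m}$ be, respectively, the left eigenvector and the right eigenvector pertaining
to the eigenvalue $\lambda_{j,m}$ ($j = 1, 2, \ldots, r$) of the matrix $A_m$. Here, we index the eigenvalues of
$A_m$ according to a non-increasing order of their modulus, that is,

$$|\lambda_{1,m}| \ge |\lambda_{2,m}| \ge \cdots \ge |\lambda_{r,m}|. \tag{33}$$

Since $P$ is a stochastic matrix (so, its maximum modulus eigenvalue is 1) and its elements
are the absolute values of the corresponding elements of $A_m$, it follows from [8, Theorem
8.4.5] (see also Lemma 1 in Subsection 3.4) that $|\lambda_{1,m}| \le 1$ (and hence $|\lambda_{j,m}| \le 1$ for all
$j = 1,2,\ldots,r$). Also, the sets of left and right eigenvectors form a bi-orthogonal system,
i.e., $l_{j,m}^T r_{k,m} = 0$, $j,k = 1,2,\ldots,r$, $j \ne k$. We scale these vectors such that $l_{j,m}^T r_{j,m} = 1$ for
all $j = 1, 2, \ldots, r$. Then by the spectral representation of matrices [8], we have

$$A_m^{n-1} = \sum_{j=1}^{r}\lambda_{j,m}^{n-1}\, r_{j,m}\, l_{j,m}^T, \tag{34}$$

and so,

$$c_m^T A_m^{n-1}\mathbf{1} = \sum_{j=1}^{r}\lambda_{j,m}^{n-1}\cdot l_{j,m}^T\mathbf{1}\cdot c_m^T r_{j,m}. \tag{35}$$

On substituting this back into the lower bound on $R_n$, we obtain:

$$R_n \ge \frac{1-\theta}{2} + \sum_{|m|=1}^{N_0} a_m(\theta)\cdot\left(1 - \frac{|m|}{N_0+1}\right)\sum_{j=1}^{r}\lambda_{j,m}^{n-1}\cdot l_{j,m}^T\mathbf{1}\cdot c_m^T r_{j,m} - \epsilon. \tag{36}$$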
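In a similar manner, we obtain the following upper bound

$$R_n = E\{\varrho(-\log\mu(X^n))\}$$
$$\le E\{\varrho_\theta^+(-\log\mu(X^n))\}$$
$$= E\{\varrho_\theta^-(-\log\mu(X^n))\} + E\{\Delta_\theta(-\log\mu(X^n))\}$$
$$\le \frac{1}{2} + \frac{\theta}{2} + \sum_{|m|=1}^{N_0}[a_m(\theta) + b_m(\theta)]\cdot\left(1 - \frac{|m|}{N_0+1}\right)\sum_{j=1}^{r}\lambda_{j,m}^{n-1}\cdot l_{j,m}^T\mathbf{1}\cdot c_m^T r_{j,m} + \epsilon. \tag{37}$$

Let us define now

$$\gamma_n(\epsilon,\theta) \triangleq \sum_{|m|=1}^{N_0}[|a_m(\theta)| + |b_m(\theta)|]\cdot\left(1 - \frac{|m|}{N_0+1}\right)\sum_{j:\,|\lambda_{j,m}|<1}|\lambda_{j,m}|^{n-1}\,|l_{j,m}^T\mathbf{1}\cdot c_m^T r_{j,m}| + \epsilon + \frac{\theta}{2}, \tag{38}$$

and recall that $N_0$ depends on $\epsilon$ and $\theta$. Obviously, for every fixed $\epsilon$ and $\theta$, the double
sum over $m$ and $j$, in the expression of $\gamma_n(\epsilon,\theta)$, tends to zero as $n \to \infty$ since all terms
contain a factor $|\lambda_{j,m}|^{n-1}$ and by definition of these terms, only $|\lambda_{j,m}| < 1$ are included in
the summation. This means that if we let $\epsilon$ and $\theta$ tend to zero slowly enough with $n$, thus
denoting them by $\epsilon_n$ and $\theta_n$, we have $\gamma_n(\epsilon_n,\theta_n) \to 0$. In particular, let us define $\epsilon_n$ and $\theta_n$
to be the minimizers$^3$ of $\gamma_n(\epsilon,\theta)$. Then, obviously, $\gamma_n = \gamma_n(\epsilon_n,\theta_n) \to 0$ as $n \to \infty$. Then,
our upper and lower bounds become

$$R_n \ge \frac{1}{2} + \sum_{|m|=1}^{N_0} a_m(\theta_n)\cdot\left(1 - \frac{|m|}{N_0+1}\right)\sum_{j:\,|\lambda_{j,m}|=1}\lambda_{j,m}^{n-1}\cdot l_{j,m}^T\mathbf{1}\cdot c_m^T r_{j,m} - \gamma_n, \tag{39}$$

and

$$R_n \le \frac{1}{2} + \sum_{|m|=1}^{N_0}[a_m(\theta_n) + b_m(\theta_n)]\cdot\left(1 - \frac{|m|}{N_0+1}\right)\sum_{j:\,|\lambda_{j,m}|=1}\lambda_{j,m}^{n-1}\cdot l_{j,m}^T\mathbf{1}\cdot c_m^T r_{j,m} + \gamma_n. \tag{40}$$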

3.4 Criteria for the Convergent and Oscillatory Modes 

Considering the derived lower bound and upper bound on $R_n$ (eqs. (39) and (40)), it
is apparent that the key issue that distinguishes between the convergent mode and the
oscillatory mode of $R_n$ is to determine under what conditions the modulus of the dominant
eigenvalue, $|\lambda_{1,m}|$, namely, the spectral radius of $A_m$, denoted $\rho(A_m)$, is equal to unity and
under what conditions it is strictly less than unity (obviously, it cannot be larger than unity).
The former case is the oscillatory mode and the latter case is the convergent one. To this
end, the following lemma, which appears in [8] (with minor modifications in its phrasing)
and has already been used in earlier related studies [9], [11], proves useful.

$^3$Note that with this choice, $\theta_n$ and $\epsilon_n$ depend only on the parameters of the source $\mu$.

Lemma 1 [8, Theorem 8.4.5, p. 509] Let $F = \{f_{kj}\}$ and $G = \{g_{kj}\}$ be two $r \times r$ matrices.
Assume that $F$ is a real, non-negative and irreducible matrix, $G$ is a complex matrix,
and $f_{kj} \ge |g_{kj}|$ for all $k,j \in \{1,2,\ldots,r\}$. Then, $\rho(G) \le \rho(F)$ with equality if and only
if there exist real numbers $s$ and $w_1,\ldots,w_r$ such that $G = e^{2\pi i s}DFD^{-1}$, where $D =
\mathrm{diag}\{e^{2\pi i w_1},\ldots,e^{2\pi i w_r}\}$.

The proof of the necessity of the condition $G = e^{2\pi i s}DFD^{-1}$ appears in [8] (see also [9],
[11]). The sufficiency is obvious since the matrix $DFD^{-1}$ is similar to $F$ and hence has the
same set of eigenvalues.

We wish to apply Lemma 1 in order to distinguish between the two aforementioned cases
concerning the spectral radius of $A_m$. Consider the state transition probability matrix $P$ in
the role of $F$ of Lemma 1 (i.e., $f_{kj} = p(j|k)$) and the matrix $A_m$ in the role of $G$. Since $P$
is assumed positive in this part, then it is obviously non-negative and irreducible. Since it
is a stochastic matrix, its spectral radius is, of course, $\rho(P) = 1$. Also, by definition of $A_m$
as the matrix $\{p(j|k)\cdot\exp[-2\pi i m\log p(j|k)]\}$, it is obvious that the elements of $P$ are the
absolute values of the corresponding elements of $A_m$, and so, all the conditions of Lemma
1 clearly apply. The lemma then tells us that $\rho(A_m) = \rho(P) = 1$ if and only if there exist
real numbers $s$ and $w_1, \ldots, w_r$ such that:

$$-m\log p(j|k) = (s + w_k - w_j) \mod 1, \qquad j,k = 1,\ldots,r, \tag{41}$$

where $x = y \mod 1$ means that the fractional parts of $x$ and $y$ are equal, that is, $\langle x\rangle = \langle y\rangle$.

To find a vector $w = (w_1, \ldots, w_r)$ and a number $s$ with this property (if they exist), we take
the following approach: Consider first the choice $k = j$ in (41). This immediately tells us
that $s$, if it exists, must be equal to $-m\log p(j|j)$ (mod 1) for every $j = 1, \ldots, r$. In other
words, one set of conditions is that $-m\log p(j|j)$ are all equal (mod 1), or equivalently,

$$\left\langle m\log\frac{p(j|j)}{p(1|1)}\right\rangle = 0, \qquad j = 2,3,\ldots,r, \tag{42}$$

and then $s$ is taken to be the common value of all $\langle -m\log p(j|j)\rangle$. Thus, eq. (41) becomes

$$m\log\frac{p(j|j)}{p(j|k)} = (w_k - w_j) \mod 1, \qquad j,k = 1,\ldots,r, \tag{43}$$

and it remains to find the vector $w$ if possible. To this end, observe that if $w$ satisfies (43),
then for every constant $c$, $w + c$ also satisfies (43). Taking $c = -w_1$,$^4$ it is apparent that if
(43) can hold for some $w$, then there is such a vector whose first component vanishes, and
then by setting $k = 1$ in (43), we learn that

$$w_j = \left\langle m\log\frac{p(j|1)}{p(j|j)}\right\rangle, \qquad j = 1,\ldots,r, \tag{44}$$

is a legitimate choice. Thus, (43) becomes

$$\left\langle m\log\left[\frac{p(j|1)\,p(j|j)}{p(k|1)\,p(j|k)}\right]\right\rangle = 0, \qquad j,k = 1,\ldots,r. \tag{45}$$

Note that by setting $k = 1$ in (45), we get (42) as a special case, which means that (45),
applied to all $j, k \in \{1,2,\ldots,r\}$, are all the necessary and sufficient conditions needed for
$\rho(A_m) = 1$. Now, a necessary and sufficient condition for eq. (45) to hold for some non-zero integer
$m$ is that the numbers

$$\alpha_{jk} = \log\left[\frac{p(j|1)\,p(j|j)}{p(k|1)\,p(j|k)}\right] \tag{46}$$

would be all rational.

We next prove the asymptotic expressions for $R_n$: first, for the case where some $\{\alpha_{jk}\}$
are irrational, which means that $\rho(A_m) < 1$ for all $m \ne 0$ (convergent mode), and then for
the case where all $\{\alpha_{jk}\}$ are rational, which means that there are non-zero values of $m$ for
which $\rho(A_m) = 1$ (oscillatory mode).

3.5 Bounds on $R_n$ in the Convergent and Oscillatory Modes

When some $\alpha_{jk}$ are irrational, then for all $m \ne 0$ and $j \in \{1,2,\ldots,r\}$, we have $|\lambda_{j,m}| <
1$, and so, the second terms (i.e., the sums over $m$) in eqs. (39) and (40) do not exist.
Consequently, we immediately get $R_n \ge \frac{1}{2} - \gamma_n$ and $R_n \le \frac{1}{2} + \gamma_n$, namely, $R_n = \frac{1}{2} + o(1)$.

Consider now the case where all $\{\alpha_{jk}\}$ are rational, and so, there exist $m \ne 0$ with
$\rho(A_m) = 1$. Our first step is to establish the fact that if $M$ is the smallest positive integer
$m$ that satisfies (45), then any other non-zero integer $m$ satisfies this property if and only if
it is an integral multiple of $M$. The fact that integer multiples of $M$ satisfy (45) is obvious
since $\langle k\cdot M\alpha_{jk}\rangle = \langle k\cdot\langle M\alpha_{jk}\rangle\rangle = \langle k\cdot 0\rangle = 0$. To see why the converse is true as well,
let $M'$ be another integer satisfying (45). If $M'$ is not an integer multiple of $M$, it must
be larger than $M$ since $M$ was defined as the smallest integer satisfying (45). Now, if $M$
and $M'$ both satisfy (45), then so does $M'' = M' - \lfloor M'/M\rfloor\cdot M$, but $M''$ must be strictly
smaller than $M$, which is a contradiction.

$^4$The choice of the first component of $w$ is arbitrary.

This means that for $m = \ell M$, $\ell = \pm 1, \pm 2, \ldots$, and only for these integers, $A_m$ has a
modulus 1 eigenvalue

$$\lambda_{1,\ell M} = \exp[2\pi i\langle -\ell M\log p(1|1)\rangle] = \exp[-2\pi i\ell M\log p(1|1)] \tag{47}$$

and the corresponding vector $w$ is $\ell$ times (mod 1) the vector $w$ associated with $m =
M$. By the Perron-Frobenius theorem [8], all other eigenvalues have modulus strictly
less than 1, and they will contribute exponentially small terms to $R_n$. Since $\lambda_{1,\ell M}^{-1}A_{\ell M}$
is similar to $P$, under the transformation matrix $D = \mathrm{diag}\{e^{2\pi i w_1}, \ldots, e^{2\pi i w_r}\}$, $w_j =
\langle\ell M\log[p(j|1)/p(j|j)]\rangle$, $j = 1,2,\ldots,r$ (see Lemma 1), then by (44), the right and left
eigenvectors associated with $\lambda_{1,\ell M}$ are, respectively,

$$r_{1,\ell M} = D\cdot\mathbf{1} = \left(1, e^{2\pi i\ell M\log[p(2|1)/p(2|2)]}, \ldots, e^{2\pi i\ell M\log[p(r|1)/p(r|r)]}\right)^T \tag{48}$$

and

$$l_{1,\ell M}^T = (\pi_1, \ldots, \pi_r)\cdot D^{-1} = \left(\pi_1, \pi_2 e^{-2\pi i\ell M\log[p(2|1)/p(2|2)]}, \ldots, \pi_r e^{-2\pi i\ell M\log[p(r|1)/p(r|r)]}\right). \tag{49}$$

Thus, the dominant term in $c_{\ell M}^T A_{\ell M}^{n-1}\mathbf{1}$ becomes:

$$\lambda_{1,\ell M}^{n-1}\cdot l_{1,\ell M}^T\mathbf{1}\cdot c_{\ell M}^T r_{1,\ell M} = \sum_{j,k} p_j\pi_k e^{2\pi i\ell\zeta_{jk}(n)}, \tag{50}$$

where Cjfe(^) is defined as in Theorem 1. Combining this relation with eq. (39), Rn is further 
lower bounded as follows: 

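$$\begin{aligned}
R_n &\ge \frac{1}{2} + \sum_{|\ell|=1}^{\lfloor N_0/M\rfloor} a_{\ell M}(\theta_n)\cdot\left(1 - \frac{|\ell M|}{N_0+1}\right)\cdot\sum_{j,k} p_j\pi_k e^{2\pi i \ell\zeta_{jk}(n)} - \gamma_n \\
&= \frac{1}{2} + \frac{1}{M}\sum_{|\ell|=1}^{\lfloor N_0/M\rfloor} a_{\ell}(M\theta_n)\cdot\left(1 - \frac{|\ell|}{(N_0+1)/M}\right)\cdot\sum_{j,k} p_j\pi_k e^{2\pi i \ell\zeta_{jk}(n)} - \gamma_n \\
&= \frac{1}{2} + \frac{1}{M}\sum_{|\ell|=1}^{\lfloor N_0/M\rfloor} a_{\ell}(M\theta_n)\left(1 - \frac{|\ell|}{\lfloor N_0/M\rfloor+1}\right)\sum_{j,k} p_j\pi_k e^{2\pi i \ell\zeta_{jk}(n)} - \gamma_n \\
&\quad - \frac{1}{M}\sum_{|\ell|=1}^{\lfloor N_0/M\rfloor} a_{\ell}(M\theta_n)\left[\frac{|\ell|}{(N_0+1)/M} - \frac{|\ell|}{\lfloor N_0/M\rfloor+1}\right]\sum_{j,k} p_j\pi_k e^{2\pi i \ell\zeta_{jk}(n)} \\
&\ge \frac{1}{2}\left(1 - \frac{1}{M}\right) + \frac{1}{M}\sum_{j,k} p_j\pi_k \varrho^-_{M\theta_n}[\zeta_{jk}(n)] - \gamma_n \\
&\quad - \frac{1}{M}\sum_{j,k} p_j\pi_k \sum_{|\ell|=1}^{\lfloor N_0/M\rfloor} a_{\ell}(M\theta_n) e^{2\pi i \ell\zeta_{jk}(n)}\left[\frac{|\ell|}{(N_0+1)/M} - \frac{|\ell|}{\lfloor N_0/M\rfloor+1}\right] - \frac{\eta_n}{M},
\end{aligned} \tag{51}$$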



where $\eta_n$ is defined as the maximum approximation error of the function $\varrho^-_{M\theta_n}$ using
$\lfloor N_0(\epsilon_n,\theta_n)/M\rfloor$ terms of the Fejér series. We wish to show now that $\eta_n \to 0$ as $n \to \infty$.
Let us assume that $\epsilon_n$ and $\theta_n$ are small enough to make $N_0 = N_0(\epsilon_n,\theta_n)$ not smaller than
$2M$, and so, $\lfloor N_0/M\rfloor \ge N_0/M - 1 \ge N_0/(2M)$. Then, using eq. (26),

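$$\begin{aligned}
\eta_n &\le \epsilon_0\left[\frac{N_0(\epsilon_n,\theta_n)}{2M},\ M\theta_n\right] \\
&= \inf_{0<\delta<1/2}\left[\frac{\delta}{M\theta_n} + \frac{2M}{N_0(\epsilon_n,\theta_n)\sin^2(\pi\delta)}\right] \\
&\le \inf_{0<\delta<1/2}\left[\frac{2M\delta}{\theta_n} + \frac{2M}{N_0(\epsilon_n,\theta_n)\sin^2(\pi\delta)}\right] \\
&= 2M\cdot\epsilon_0[N_0(\epsilon_n,\theta_n),\ \theta_n] \\
&\le 2M\epsilon_n \to 0,
\end{aligned} \tag{52}$$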



where the last inequality follows from eq. (27). Thus, $\eta_n/M$ in the last line of (51) is upper
bounded by $2\epsilon_n$. The first two terms in the last expression of (51) form $Q_n$, as defined in
Theorem 1. Now, for the absolute value of the fourth term, it is first observed that upon a
standard algebraic manipulation under the assumption $N_0 \ge 2M$, we have



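$$\left|\frac{1}{(N_0+1)/M} - \frac{1}{\lfloor N_0/M\rfloor+1}\right| \le \frac{M\left|\langle N_0/M\rangle + 1/M - 1\right|}{(N_0+1)(\lfloor N_0/M\rfloor+1)} \le \frac{2M^2}{N_0^2}. \tag{53}$$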
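As a sanity check on the algebra leading to (53), the sketch below scans a range of integer pairs with $N_0 \ge 2M$ and looks for violations of the bound, read here as $|M/(N_0+1) - 1/(\lfloor N_0/M\rfloor + 1)| \le 2M^2/N_0^2$. The function name and the scanned ranges are arbitrary choices made for illustration.

```python
def window_gap(N0, M):
    """|1/((N0+1)/M) - 1/(floor(N0/M)+1)|: gap between the two triangular-window slopes."""
    return abs(M / (N0 + 1) - 1 / (N0 // M + 1))


# Collect every pair (N0, M) with N0 >= 2M in the scanned range that violates the bound.
violations = [(N0, M)
              for M in range(1, 30)
              for N0 in range(2 * M, 2000)
              if window_gap(N0, M) > 2 * M**2 / N0**2]
```

An empty `violations` list over the scanned range is consistent with the bound; it is of course not a proof, which follows from the manipulation described in the text.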



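Thus, the fourth term of (51) is upper bounded by the weighted sum (with weights $p_j\pi_k$
for each pair $(j,k)$) of terms that are bounded as follows:

$$\begin{aligned}
\frac{1}{M}\sum_{|\ell|=1}^{\lfloor N_0/M\rfloor} |a_\ell(M\theta_n)|\cdot\left|\frac{|\ell|}{(N_0+1)/M} - \frac{|\ell|}{\lfloor N_0/M\rfloor+1}\right|
&\le \frac{4M}{N_0^2}\sum_{\ell=1}^{\lfloor N_0/M\rfloor} \ell\,|a_\ell(M\theta_n)| \\
&\le \frac{2M}{\pi N_0^2}\sum_{\ell=1}^{\lfloor N_0/M\rfloor}\sqrt{\frac{2[1-\cos(2\pi\ell M\theta_n)]}{(2\pi\ell M\theta_n)^2}}.
\end{aligned} \tag{54}$$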

To bound the summand of the last expression, consider the following: For every positive $t$,
clearly, $\sin t < t$, and so, for every $\alpha > 0$,

$$1 - \cos\alpha = \int_0^\alpha \sin t\,dt < \int_0^\alpha t\,dt = \frac{\alpha^2}{2}, \tag{55}$$

which for $\alpha = 2\pi\ell M\theta_n$, implies that the summand is bounded by 1, and hence the expression
in the last chain of inequalities is further upper bounded by $\delta_n = 2/(\pi N_0)$. Since $N_0 =
N_0(\epsilon_n,\theta_n)$ then $\delta_n \to 0$, and we have

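$$\begin{aligned}
R_n &\ge \frac{1}{2}\left(1 - \frac{1}{M}\right) + \frac{1}{M}\sum_{j,k} p_j\pi_k\varrho[\zeta_{jk}(n)] - \sum_{j,k} p_j\pi_k\Delta_{M\theta_n}[\zeta_{jk}(n)] - \gamma_n - 2\epsilon_n - \delta_n \\
&\ge \frac{1}{2}\left(1 - \frac{1}{M}\right) + \frac{1}{M}\sum_{j,k} p_j\pi_k\varrho[\zeta_{jk}(n)] - \sum_{j,k} p_j\pi_k\,\mathcal{I}\{\varrho[\zeta_{jk}(n)] \notin (M\theta_n,\ 1 - M\theta_n)\} - \gamma_n - 2\epsilon_n - \delta_n,
\end{aligned} \tag{56}$$

and so, the lower bound of Theorem 1 is obtained with $\xi_n = M\theta_n$. In the very same manner,
the upper bound on $R_n$ is given by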

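$$R_n \le \frac{1}{2}\left(1 - \frac{1}{M}\right) + \frac{1}{M}\sum_{j,k} p_j\pi_k\varrho[\zeta_{jk}(n)] + \sum_{j,k} p_j\pi_k\Delta_{M\theta_n}[\zeta_{jk}(n)] + \gamma_n + 2\epsilon_n + \delta_n \tag{57}$$

$$\le \frac{1}{2}\left(1 - \frac{1}{M}\right) + \frac{1}{M}\sum_{j,k} p_j\pi_k\varrho[\zeta_{jk}(n)] + \sum_{j,k} p_j\pi_k\,\mathcal{I}\{\varrho[\zeta_{jk}(n)] \notin (\xi_n,\ 1 - \xi_n)\} + \gamma_n + 2\epsilon_n + \delta_n, \tag{58}$$

which is the upper bound of Theorem 1. Here, one has to bound also an expression similar to
(54), but with $a_\ell(M\theta_n)$ being replaced by $b_\ell(M\theta_n)$, and the bounding technique is similar.
This completes the proof of Theorem 1.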
4 Extensions



We now discuss some extensions of Theorem 1. In particular, we drop the assumption that 
all transition probabilities must be strictly positive and first assume that P corresponds to 
an irreducible aperiodic Markov source. Then we drop the aperiodicity constraint. 

4.1 Irreducible Aperiodic Markov Sources 

When some of the entries of the matrix $P$ vanish, then obviously, Theorem 1 cannot be used
as is, since the corresponding parameters $\alpha_{jk}$ are no longer well defined. Lemma 1, which
stands at the heart of the proof of Theorem 1, can still be used as long as $P$ is irreducible,
but more caution should be exercised. The key issue is still to determine whether there
exist parameters $s$ and $w$ (and to find them if they exist) that satisfy

$$-m\log p(j|k) = (s + w_k - w_j) \bmod 1, \tag{59}$$

but now these equations are imposed only for the pairs $(j,k)$ for which $p(j|k) > 0$ (as for the
other pairs, $a_{jk}(m) = p(j|k) = 0$ satisfy the conditions of Lemma 1 automatically anyway).
The approach taken in the solution for $s$ and $w$, that was derived in the first part of Section
3, can still be applied, with some minor modifications, as long as at least some particular
subsets of the entries of $P$ are still positive.

For example, if one or more diagonal elements of $P$ are positive, and for all positive $p(j|j)$,
the numbers $\langle -m\log p(j|j)\rangle$ are equal, then $s$ can still be taken to be the common value of
all these numbers. If, in addition, at least one row of $P$ is strictly positive, say, row number
$l$, then $w_j$ can be taken to be $\langle m\log[p(l|l)/p(j|l)]\rangle$, and then the rationality condition of
Theorem 1 is replaced by the condition that

$$\alpha'_{jk} = \log\left[\frac{p(j|l)\,p(l|l)}{p(k|l)\,p(j|k)}\right] \tag{60}$$

must be rational for all $(j,k)$ with $p(j|k) > 0$. The bounds on $R_n$ in the oscillatory mode
would be exactly as in Theorem 1, but with the above assignments of $s$ and $w$.

For a general non-negative matrix $P$, however, it may not be a trivial task to determine
whether equations (59) have a solution, and if so, what this solution is. In fact, it may be
simpler and more explicit to check directly if $A_m$ has an eigenvalue on the unit circle (which
thereby dictates $s$) and then to find $w$ using Lemma 1. This would lead to the following
generalized version of Theorem 1.
Theorem 2 Consider the Shannon code of block length $n$ for an irreducible aperiodic
Markov source. Let $M$ be defined as the smallest positive integer $m$ such that

$$\rho(A_m) = |\lambda_{1,m}| = 1 \tag{61}$$

and as $M = \infty$ if (61) does not hold for any positive integer $m$. Then, $R_n$ is characterized
as follows:

(a) If $M = \infty$, then

$$R_n = \frac{1}{2} + o(1). \tag{62}$$

(b) If $M < \infty$, then the bounds of Theorem 1, part (b), hold with $\zeta_{jk}(n)$ being redefined
according to

$$\zeta_{jk}(n) = M[(n-1)s + w_j - w_k - \log p_j], \tag{63}$$

where

$$s = \frac{\arg\{\lambda_{1,M}\}}{2\pi} \tag{64}$$

and

$$w_j = \frac{\arg\{x_j\}}{2\pi}, \quad j = 1,2,\ldots,r, \tag{65}$$

$x_j$ being the $j$-th component of the right eigenvector $x$ of $A_M$, which is associated with the
dominant eigenvalue $\lambda_{1,M}$.

The proof of Theorem 2 is very similar to that of Theorem 1, and hence we will not provide
it here. In a nutshell, we observe that the Perron-Frobenius Theorem and Lemma 1 are
still applicable. Then, we use the necessity of the condition $A_m = e^{2\pi i s}DPD^{-1}$ and the
fact that once this condition holds, the vector $x = D\cdot\mathbf{1} = (e^{2\pi i w_1}, \ldots, e^{2\pi i w_r})^T$ is the right
eigenvector associated with the dominant eigenvalue $\lambda_{1,m} = e^{2\pi i s}$.

Unfortunately, Theorem 2 does not suggest a practical way to find $M$. One must start
with $m = 1$, check if $\rho(A_1) = 1$; if not, increment $m$ to 2, check $\rho(A_2)$, and so on. In
the event that $M = \infty$, we do not have a stopping rule and we may keep incrementing $m$
indefinitely. An interesting point to note, however, is that the oscillatory expression goes
to 1/2 when $M$ grows without bound. This means that given the block length $n$, it is
sufficient to stop incrementing $m$ at some $m(n)$, where $m(n)$ is an arbitrary function that
grows (no matter how slowly) with $n$. This is because the oscillatory expression will
then be $1/2 + o(1)$ anyway, just like the convergent expression, so the distinction between
the two modes loses its meaning.
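The incremental search for $M$ described above can be sketched as follows. This is a minimal illustration under several assumptions that are not part of the text: logarithms are taken to base 2, the entries of $A_m$ are $p\,e^{-2\pi i m \log p}$ for positive $p$ and zero otherwise, $P$ is row-stochastic, and the cutoff `m_max` plays the role of the stopping point $m(n)$. Instead of computing eigenvalues, the sketch uses a substitute test: since the entries of $A_m$ have the same moduli as those of $P$, the row-sum norm of a high power of $A_m$ stays at 1 when $\rho(A_m) = 1$ and decays to 0 otherwise. All function names are hypothetical.

```python
import cmath
import math


def phase_matrix(P, m):
    """A_m: each positive entry p becomes p * exp(-2*pi*i*m*log2(p)); zeros stay zero."""
    return [[p * cmath.exp(-2j * math.pi * m * math.log2(p)) if p > 0 else 0j
             for p in row] for row in P]


def matmul(A, B):
    """Plain matrix product for small square matrices."""
    r = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(r)) for j in range(r)]
            for i in range(r)]


def has_unit_spectral_radius(A, squarings=10, tol=1e-6):
    """Heuristic test of rho(A) = 1 via the row-sum norm of A^(2^squarings)."""
    for _ in range(squarings):
        A = matmul(A, A)
    return max(sum(abs(x) for x in row) for row in A) > 1 - tol


def find_M(P, m_max):
    """Smallest m <= m_max with rho(A_m) = 1, or None if no such m is found."""
    for m in range(1, m_max + 1):
        if has_unit_spectral_radius(phase_matrix(P, m)):
            return m
    return None


# Hypothetical symmetric source with log2(a / (1 - a)) = 1/4.
a = 2 ** 0.25 / (1 + 2 ** 0.25)
M_sym = find_M([[a, 1 - a], [1 - a, a]], m_max=5)
M_dyadic = find_M([[0.5, 0.5], [0.5, 0.5]], m_max=5)
M_none = find_M([[0.5, 0.5], [0.25, 0.75]], m_max=4)
```

The fixed number of squarings and the tolerance are ad hoc; a spectral radius very close to, but below, 1 would require a higher power to be detected, which mirrors the absence of a stopping rule noted in the text.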
Finally, it is instructive to demonstrate an example of a reducible Markov source, for
which Theorems 1 and 2 do not hold, and see that even in a simple situation ($r = 2$), once
the irreducibility assumption is dropped, the two-mode behavior, predicted by Theorems 1
and 2, disappears. Thus, the point in Example 2 below is that the irreducibility assumption
is not imposed just for technical convenience. It is actually essential for Theorems 1 and 2
to hold.

Example 2. Reducible Markov source. Consider the case $r = 2$, where $p(1|2) = 0$ and
$\alpha = p(2|1) \in (0,1)$, i.e.,

$$P = \begin{pmatrix} 1-\alpha & \alpha \\ 0 & 1 \end{pmatrix}. \tag{66}$$

Assume also that $p_1 = 1$ and $p_2 = 0$. Since this is a reducible Markov source (once in state
2, there is no way back to state 1), we cannot use Theorems 1 and 2, but we can still find
an asymptotic expression of the redundancy in a direct manner: Note that the chain starts
at state '1' and remains there for a random duration, which is a geometrically distributed
random variable with parameter $(1 - \alpha)$. Thus, the probability of $k$ 1's (followed by $n - k$
2's) is about $(1 - \alpha)^k \cdot \alpha$ (for large $n$) and so the argument of the function $\varrho(\cdot)$ should be
the negative logarithm of this probability. Taking the expectation w.r.t. the randomness of
$k$, we readily have

$$R_n = \sum_{k=0}^{\infty} \alpha(1-\alpha)^k \varrho[-\log\alpha - k\log(1-\alpha)] + o(1). \tag{67}$$

We see then that there is no oscillatory mode in this case, as $R_n$ always tends to a constant
that depends on $\alpha$, in contrast to the convergent mode of Theorems 1 and 2, where the limit
is always 1/2, independently of the source statistics. To summarize, it is observed that the
behavior here is very different from that of the irreducible case, characterized by Theorems
1 and 2.
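The limiting constant of Example 2 can be evaluated numerically by truncating the series in (67). The sketch below assumes base-2 logarithms and takes the redundancy function to be $\varrho(u) = \lceil u \rceil - u$, which is the natural form for the Shannon code but is defined outside this section; the function names and the truncation length are hypothetical.

```python
import math


def rho(u):
    """Assumed form of the redundancy function: rho(u) = ceil(u) - u."""
    return math.ceil(u) - u


def limiting_redundancy(alpha, terms=2000):
    """Truncated series from (67), without the o(1) term."""
    total = 0.0
    for k in range(terms):
        weight = alpha * (1 - alpha) ** k  # geometric probability of k ones
        total += weight * rho(-math.log2(alpha) - k * math.log2(1 - alpha))
    return total


limit_half = limiting_redundancy(0.5)
limit_other = limiting_redundancy(0.3)
```

For $\alpha = 1/2$ every sequence probability is a power of 2, so each term of the series vanishes, whereas a generic $\alpha$ gives a strictly positive constant. This illustrates that the limit depends on the source and is not pinned at 1/2.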

4.2 Irreducible Periodic Markov Sources 

Consider now an irreducible periodic Markov source. The Perron-Frobenius theorem and
Lemma 1 still hold [8]. However, the matrix $P$ now has $d$ eigenvalues on the unit circle,
namely, all the $d$-th roots of unity [8], where $d$ is the period, i.e.,

$$\lambda_t = e^{2\pi i t/d}, \quad t = 0,1,\ldots,d-1. \tag{68}$$
Let $r_t$ and $l_t$ be the right and the left eigenvectors of $P$ that are associated with $\lambda_t$. The
analysis is similar to that of the aperiodic case, except that we now have $d$ oscillatory terms, one
for each eigenvalue on the unit circle. Indeed, suppose that for some $m$, the matrix $A_m$ has
a modulus-1 eigenvalue $\lambda = e^{2\pi i s}$. Then, of course,

$$A'_m = e^{-2\pi i s}A_m \tag{69}$$

has eigenvalue 1. By definition, the entries of $P$ are still the absolute values of the
corresponding entries of $A'_m$, as in Lemma 1. Thus, by this lemma, $A'_m$ is similar to $P$, and so it
has the same eigenvalues as $P$. Among them, the $d$-th roots of unity $\lambda_t$, $t = 0,1,\ldots,d-1$,
are eigenvalues of $A'_m$. Therefore, $A_m$ has the following eigenvalues on the unit circle:

$$\lambda_{t,m} = e^{2\pi i(s + t/d)}, \quad t = 0,1,\ldots,d-1. \tag{70}$$

Let us relabel, if necessary, the eigenvalues of $A_m$ such that $s \in [0, 1/d)$. This means that
the definition of $s$ in Theorem 2 should be restricted to the half-open interval $[0, 1/d)$. Thus,
Theorem 2 holds except that $\zeta_{jk}(n)$ is replaced by

$$\zeta_{jkt}(n) = M\left[(n-1)\left(s + \frac{t}{d}\right) + w_j - w_k - \log p_j\right], \quad j,k \in \{1,2,\ldots,r\},\ t = 0,1,\ldots,d-1 \tag{71}$$

and the double summations over $(j,k)$ with weights $p_j\pi_k$ are replaced by corresponding
triple summations over $(j,k,t)$ with weights $p_j r_{t,j} l_{t,k}$, where $l_{t,k}$ is the $k$-th component of
$l_t$ and $r_{t,j}$ is the $j$-th component of $r_t$. Note that $r_{0,j} = 1$ and $l_{0,k} = \pi_k$, so for $d = 1$ we
indeed obtain the expression (63) of the aperiodic case as a special case.
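The statement that a periodic irreducible $P$ carries all $d$-th roots of unity as eigenvalues can be illustrated on the simplest such chain. The sketch below uses a deterministic cycle over $d$ states, a hypothetical example chosen only because its eigenvectors are known in closed form, and checks that each root $e^{2\pi i t/d}$ is an eigenvalue.

```python
import cmath
import math

d = 4
# Deterministic cycle 0 -> 1 -> ... -> d-1 -> 0: irreducible with period d.
P = [[1.0 if k == (j + 1) % d else 0.0 for k in range(d)] for j in range(d)]

residuals = []
for t in range(d):
    lam = cmath.exp(2j * math.pi * t / d)   # t-th of the d-th roots of unity
    v = [lam ** j for j in range(d)]        # candidate right eigenvector
    Pv = [sum(P[j][k] * v[k] for k in range(d)) for j in range(d)]
    residuals.append(max(abs(Pv[j] - lam * v[j]) for j in range(d)))
```

Each residual is at rounding level, so this matrix has $d$ eigenvalues of modulus 1, one for each $t$, which is why $d$ oscillatory terms appear in the periodic case.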

Appendix 

In this appendix, we establish the relation (26). As is shown in [20], the coefficients of
the $N$-th order Fejér series expansion, $\{f(u)\}_N$, of a general periodic function $f(u)$, with
period 1, are given by the Fourier coefficients $f_m$ multiplied by the "triangular window"
$1 - |m|/(N+1)$. This means that in the original $u$-domain, the reconstruction $\{f(u)\}_N$ is
given by the convolution between $f(u)$ and the kernel

$$K_N(u) = \sum_{m=-N}^{N}\left(1 - \frac{|m|}{N+1}\right)e^{2\pi i m u} = \frac{\sin^2[(N+1)\pi u]}{(N+1)\sin^2(\pi u)}. \tag{A.1}$$
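The two forms of the Fejér kernel, the triangularly windowed exponential sum and the closed form $\sin^2[(N+1)\pi u]/((N+1)\sin^2(\pi u))$, can be compared numerically, together with the unit-integral and non-negativity properties used in the argument that follows. This is a minimal sketch; the function names, the order $N = 12$, and the grid size are arbitrary illustrative choices.

```python
import cmath
import math


def fejer_sum(N, u):
    """Kernel as the windowed exponential sum (imaginary parts cancel)."""
    return sum((1 - abs(m) / (N + 1)) * cmath.exp(2j * math.pi * m * u)
               for m in range(-N, N + 1)).real


def fejer_closed(N, u):
    """Closed form sin^2((N+1)*pi*u) / ((N+1)*sin^2(pi*u)), for non-integer u."""
    return math.sin((N + 1) * math.pi * u) ** 2 / ((N + 1) * math.sin(math.pi * u) ** 2)


N = 12
# Midpoints of a uniform partition of [-1/2, 1/2]; u = 0 is never hit.
grid = [(i + 0.5) / 4000 - 0.5 for i in range(4000)]
max_gap = max(abs(fejer_sum(N, u) - fejer_closed(N, u)) for u in grid)
integral = sum(fejer_closed(N, u) for u in grid) / len(grid)
minimum = min(fejer_closed(N, u) for u in grid)
```

The closed form makes non-negativity evident, while the sum form makes the unit integral evident, since only the $m = 0$ term survives integration over one period.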
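Since $\int_{-1/2}^{+1/2} K_N(u)\,du = 1$, we have

$$\begin{aligned}
|f(u) - \{f(u)\}_N| &= \left|f(u) - \int_{-1/2}^{+1/2} dt\, f(u-t)K_N(t)\right| \\
&= \left|\int_{-1/2}^{+1/2} dt\,[f(u) - f(u-t)]K_N(t)\right| \\
&\le \int_{-1/2}^{+1/2} dt\,|f(u) - f(u-t)|\cdot K_N(t) \\
&= \int_{|t|<\delta} dt\,|f(u) - f(u-t)|\cdot K_N(t) + \int_{|t|\ge\delta} dt\,|f(u) - f(u-t)|\cdot K_N(t)
\end{aligned} \tag{A.2}$$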

for every $\delta \in (0, 1/2)$. Now, in our case, for all three functions, $|t| < \delta$ implies
$|f(u) - f(u-t)| \le \delta/\theta$, since the maximum absolute slope of all three of them is $1/\theta$. Since $K_N(t) \ge 0$
and $\int_{-1/2}^{+1/2} dt\,K_N(t) = 1$, the first integral in the last line is bounded by $\delta/\theta$. As for the
second integral, in our case, $|f(u) - f(u-t)| \le 1$ for all three functions. Since the sine
function is monotonically increasing in the range $[0, \pi/2]$, then $1/2 \ge |t| \ge \delta$ implies

$$K_N(t) \le \frac{1}{(N+1)\sin^2(\pi\delta)} \le \frac{1}{N\sin^2(\pi\delta)}. \tag{A.3}$$

Thus, for every $\delta \in (0, 1/2)$,

$$|f(u) - \{f(u)\}_N| \le \frac{\delta}{\theta} + \frac{1}{N\sin^2(\pi\delta)} \tag{A.4}$$

and eq. (26) is obtained upon minimizing the r.h.s. over the free parameter $\delta$.
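The minimization over $\delta$ has no closed form in general, but it is easy to approximate on a grid. The sketch below assumes the bound has the form $\delta/\theta + 1/(N\sin^2(\pi\delta))$ and denotes its infimum by `eps0(N, theta)`, a hypothetical name. It also checks the scaling step used in (52), namely that the infimum with $N_0/(2M)$ terms and slope parameter $M\theta$ does not exceed $2M$ times the infimum with $N_0$ terms and parameter $\theta$; the numerical values are arbitrary.

```python
import math


def eps0(N, theta, grid=20000):
    """Grid approximation of inf over 0 < delta < 1/2 of
    delta/theta + 1/(N * sin^2(pi*delta))."""
    best = float("inf")
    for i in range(1, grid):
        delta = 0.5 * i / grid
        best = min(best, delta / theta + 1.0 / (N * math.sin(math.pi * delta) ** 2))
    return best


M, theta, N0 = 3, 0.01, 10**6           # hypothetical values
lhs = eps0(N0 / (2 * M), M * theta)     # error bound with N0/(2M) terms, slope 1/(M*theta)
rhs = 2 * M * eps0(N0, theta)           # 2M times the bound with N0 terms, slope 1/theta
```

The inequality `lhs <= rhs` holds term by term in $\delta$, because $\delta/(M\theta) \le 2M\delta/\theta$ for every $M \ge 1$, so it survives the grid approximation as well.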